Anybody remotely sane who follows the news will have noticed an uptick over the past 4 or so years in leftist insanity regarding issues of so-called "social justice". I believe there is now a regular pattern of companies committing some faux pas according to the "social justice" left, usually pertaining to some perceived racism or sexism committed by the company or a representative of the company. While it's true that only a small percentage of people actually believe in this far-left conception of "social justice", the true believers are for various reasons disproportionately represented in the media, meaning that these "faux-faux-pas" are often highlighted disproportionately in the news.
To complicate matters, we live in a society run amok with corporate fascism, by which I mean that corporations of certain sizes and industries receive special government privilege, absolving the individuals responsible for them of legal responsibility for even blatant negligence and incompetence (aka impunity, see Equifax/Google/Facebook/etc). In this case, the justified outrage is often reported in the media as well.
My intuition is that these controversies can affect the market in a predictable way: when companies of a certain size or status (this will need a better definition) come under fire for faux or real controversy, the market responds emotionally. Angry/offended traders, or traders who believe many others will be angry/offended, undervalue the stock of the company under fire.
In the case of the "social justice" controversies, I predict that the company under fire takes a hit for superficial reasons, but its true underlying value is unaffected. As I said previously, only a small portion of people genuinely believe in the far-left conception of "social justice", and I think that we can reasonably assume that an even smaller portion of those will actually change their purchasing behavior in response to said controversy. Therefore, the emotional reaction to the controversy will cause the stock to be undervalued, which presents an opportunity for us to buy low and sell high.
In the case of corporate impunity controversies, my hypothesis is that the company under fire will again take a hit, this time due to genuine moral failings. However, due to rampant corporatism, this short term dip in price will reliably be reversed, since these corporations are largely immune to the consequences of their criminal or ethical failings and via government privilege will remain as valuable as before. Again, the market's failure to recognize this impunity will be another opportunity to buy low and sell high.
Of course everything above is merely my own intuition and speculation. In order to determine whether this hypothesis is actually true, I have gathered data from a variety of recent controversies over the past few years, starting from the articles listed below. I verified the dates and circumstances of each controversy through further research (those sources will be enumerated later alongside their relevant stock market data). Many thanks to some guy named Geoffery James who apparently has made corporate scandal recaps his forte.
In order to determine if there is any validity to my hypothesis, my strategy is to visualize the relevant stock price data around what I've determined to be the start time of each controversy. To do this, I will create graphs of relevant stock data 7 days, 30 days, 90 days, and 180 days either side of the date of the controversy (simply ignoring weekends). These timeframes are chosen to give a picture of the data at several scales, and can be easily adjusted later if it's determined that different angles should be looked at. The adjusted close value is used.
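As a rough illustration of the windowing described above, here is a minimal sketch. It assumes the adjusted closes are held in a pandas Series indexed by a sorted DatetimeIndex that contains trading days only; the function name `window_around` is hypothetical and is not part of model/Controversy.py.

```python
import pandas as pd


def window_around(prices, event_date, n_days):
    """Return the prices within n_days calendar days either side of event_date.

    Assumes `prices` is a Series of adjusted closes on a sorted DatetimeIndex.
    Weekends and holidays are simply absent from the index, so they are
    ignored without any special handling.
    """
    event = pd.Timestamp(event_date)
    delta = pd.Timedelta(days=n_days)
    # Label-based slicing on a sorted DatetimeIndex includes both endpoints
    return prices.loc[event - delta : event + delta]
```

Calling this once per window length (for example 7, 30, 90 and 180) gives the slices that would then be plotted.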
The spreadsheet detailing my manual research (saved as CSV) can be found under data/controversy_data.csv (or click link to download).
The stock market data is taken from the Alpha Vantage API, which is a freemium service that appears to get good reviews based on some cursory Google searches.
Each row of the CSV data is converted to a Controversy object defined in model/Controversy.py
Unfortunately, the free version of the Alpha Vantage service has an API call limit but leaves it up to the user to guess at what it is (idk what the strategy is here, maybe they're just trying to piss the user off enough to give up and pay for the premium service). I've found that if I limit call volume to once per minute, I can safely assume I won't get locked out. In order to prevent my code from taking ~60-90 minutes to compute every time it's run, I've chosen to create a separate script that makes all the API calls to gather the stock market data, and then serializes the list of Controversy objects, each including its relevant stock data, into data/controversies.pickle. This way, I can run the time-expensive script once to gather the data and quicken my iteration cycle for the code doing the actual parsing and displaying of the data.
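The fetch-once-then-cache pattern described above could look roughly like the sketch below. This is not the actual data-gathering script: `load_or_build` and its `fetch` argument are hypothetical names, with `fetch` standing in for whatever function performs a single Alpha Vantage request, and the 60 second pause reflects the once-per-minute limit I settled on by trial and error.

```python
import os
import pickle
import time


def load_or_build(cache_path, symbols, fetch, delay_seconds=60):
    """Return cached results if the pickle exists; otherwise fetch and cache them.

    `fetch` is a caller-supplied function taking one symbol (hypothetical
    stand-in for a single API request). Calls are spaced delay_seconds apart.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    results = {}
    for i, symbol in enumerate(symbols):
        if i > 0:
            time.sleep(delay_seconds)  # throttle between consecutive calls
        results[symbol] = fetch(symbol)

    with open(cache_path, "wb") as f:
        pickle.dump(results, f)
    return results
```

Deleting the pickle file forces a fresh (slow) download on the next run.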
import pandas as pd
import pickle
from datetime import datetime
import model
from model.Controversy import Controversy
import matplotlib.pyplot as plt
from IPython.display import Markdown, display
def printmd(string):
    display(Markdown(string))
PIK = "data/controversies.pickle"
# load list of controversies
with open(PIK, "rb") as f:
    controversies = pickle.load(f)
# sort chronologically by controversy date
controversies.sort(key=lambda x: x.date)
%matplotlib inline
%pylab inline
import pylab
pylab.rcParams['figure.figsize'] = (16, 4) # set figure size
def gen_graph_output(controversy):
    """Print a controversy's details and plot its stock data over several windows."""
    try:
        display(Markdown("### " + controversy.company))
        print("Relevant Stock(s): {}".format(controversy.stocks))
        print("Date: {}".format(controversy.date.strftime("%Y-%m-%d")))
        print("Summary: {}".format(controversy.summary))
        if controversy.notes == controversy.notes:  # NaN != NaN, so missing notes are skipped
            print("Notes: {}".format(controversy.notes))
        print("Source: {}".format(controversy.source))
        N_days = [7, 30, 90, 180]
        for N in N_days:
            controversy.get_N_day_plot(N)
            plt.show()
        print()
        print()
    except Exception:
        # flush any partially drawn figure, then report the missing data
        plt.show()
        print("No Data")
        print()
        print()
i = -1
i+=1
con = controversies[i]
gen_graph_output(con)
Bad data, ignore.
con.categories = ["ignore"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
There's no business argument for this incident. This is the only food safety based controversy in the list, so it's difficult to determine anything from this datapoint.
con.categories = ["food"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
This is clearly SJW rage. Opposite to my hypothesis, there is no market reaction.
con.categories = ["sjw", "offensive"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
There doesn't appear to be much to this incident. The story wasn't "SJW" outrage per se; if anything, this is an outrage likely to be driven by conservatives. It does, however, fall into the broader category of "offensive" outrages, which I will define as an incident where no real crime was committed and no real business operation was demonstrated, people were just offended by ideas.
con.categories = ["conservative", "offensive"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
There appears to be a big dip after this incident. Although this isn't the typical social justice outrage, it's similar in character in that the outrage is an emotional reaction to a specific superficial detail that isn't directly related to the company's bottom line. Comcast does well because it's a state-backed monopoly; nobody chooses them for their superior customer service.
con.categories = ["offensive"]
con.is_promising = True
i+=1
con = controversies[i]
gen_graph_output(con)
Difficult to see a strong pattern in this data. That slight drop could be due to the controversy; however, the stock was already on a similar downward slope shortly before.
con.categories = ["sex", "crime"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
There is a noticeable dip several days after this controversy, although it's not immediate. Still, there may be some promise here.
con.categories = ["workers"]
con.is_promising = True
i+=1
con = controversies[i]
gen_graph_output(con)
Couldn't get data on this one unfortunately.
con.categories = ["ignore"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
Nestle somehow went utterly untouched by this slavery controversy. Perhaps it's due to the fact that it was brought to light by the company itself (see article)? This one is puzzling to me.
con.categories = ["criminal"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
This incident appears to have caused a short term drop in price. Not as immediate as some of the others, but still of promise.
con.categories = ["impunity", "healthcare"]
con.is_promising = True
i+=1
con = controversies[i]
gen_graph_output(con)
There is a sharp downward spike soon after the report, near Sept 15. I investigated that, and it appears to be a regular drop that comes soon before Apple's annual September announcement (https://www.marketwatch.com/story/how-apples-stock-tends-to-trade-around-its-september-event-2016-09-06). That itself may be something to investigate further in a future study. On further thought, I decided to look at Apple's stock as well (see below).
con.categories = ["workers"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
There is a downward spike similar to Foxconn near Sept 15.
con.categories = ["workers"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
This appears to be a classic case of corporate impunity. WF took a short term blow before bouncing back up. (When is the last time a bank went out of business for fraud?)
con.categories = ["impunity", "fraud"]
con.is_promising = True
i+=1
con = controversies[i]
gen_graph_output(con)
Clear, sharp drop upon news of the hack, followed by predictable rise as people rediscover that large corporations are immune to gross negligence.
con.categories = ["impunity", "hack"]
con.is_promising = True
i+=1
con = controversies[i]
gen_graph_output(con)
Disney seems to have gone untouched by this controversy. On further thought, I decided to see if YouTube (GOOG) felt the blow (see below).
con.categories = ["sjw", "offensive"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
Nothing here either.
con.categories = ["sjw", "offensive"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
I have several Uber controversies in this analysis, just because they had so many juicy stories to pick from in the recent past. Unfortunately they're privately held, and their controversies don't appear to make any sort of dent in their large stakeholders. I'm leaving these in this initial analysis so we know that Uber isn't a good company to target (at least until they decide to go public). I'm going to ignore them in future analyses, however, because I believe that the proxies I've chosen aren't really proxies at all, and keeping them in would throw off what might be actually useful statistical results.
con.categories = ["ignore"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
Ignore.
con.categories = ["ignore"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
There's no noticeable direct market response to this controversy.
con.categories = ["sjw", "offensive"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
Ignore.
con.categories = ["ignore"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
It looks like Fox's stock takes a hit directly in response to this controversy, as demonstrated by the fact that it reaches a local maximum on the day of the controversy. Unlike some of the other controversies, this controversy seems to have had a longer term impact on Fox's stock valuation.
con.categories = ["sex", "offensive"]
con.is_promising = True
i+=1
con = controversies[i]
gen_graph_output(con)
This result really surprised me: there doesn't appear to be any significant change in United's stock price due to this widely publicized incident. There is a blip a few days afterward, but it looks difficult to argue that it's distinguishable from the normal turbulence (pun intended, sue me) seen before and since the controversy.
con.categories = ["offensive"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
It's possible that there's a slight downturn after this incident but if so, it's insignificant. I'm classifying this "sjw" because it's the sort of controversy that the political left would be most concerned with.
con.categories = ["sjw", "politics", "congress"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
Ignore.
con.categories = ["ignore"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
Ignore.
con.categories = ["ignore"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
I expected there to be an obvious response in the stock price due to this incident but frankly there isn't. It could perhaps be argued that the sharp minimum in the week after the controversy is directly related, but it appears insignificantly small and difficult to distinguish from normal market turbulence.
con.categories = ["sjw", "offensive"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
There's arguably a downswing shortly after this controversy, although it's not obvious that it's directly related. Using an ETF proxy seems inherently very risky for short term speculation, because other stocks in the fund could be behaving in direct opposition to whatever you're tracking. Not a good lead for that reason.
con.categories = ["criminal", "impunity"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
It would be difficult to argue that it's in direct relation to this controversy, however the nosedive a couple of weeks later may warrant further investigation.
con.categories = ["sjw", "politics"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
Very dramatic, obvious overreaction to this hack, followed by a correction upward. Promising.
con.categories = ["hack", "impunity"]
con.is_promising = True
i+=1
con = controversies[i]
gen_graph_output(con)
Nothing of particular interest here.
con.categories = ["workers"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
There's no dramatic response to this controversy. It should be noted that this is news that came out in 2017 about an incident that happened in 2013. As of writing this, I haven't directly explored the 2013 incident, but I have made a note that it should be explored.
con.categories = ["hack", "impunity"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
Again, Google doesn't appear to take any sort of hit due to this YouTube based controversy. It could be that my SJW hypothesis is incorrect; it could also be that because YouTube represents only a small portion of Google's overall value (see Notes section above), its controversies don't adversely affect the overall stock price.
con.categories = ["offensive"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
This controversy resides in a category of its own, since it's not so much corporate impunity or a response to a social/political issue only deeply cared about by a small portion of the population, but rather what appears to have been an overwhelming majority response to shitty and manipulative business design. I don't think it pertains to my original hypothesis so I'm marking it as not promising.
con.categories = ["gaming"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
Ignore.
con.categories = ["ignore"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
There appears to be a sharp dip in Apple's price in direct response to this controversy.
con.categories = ["hack", "impunity"]
con.is_promising = True
i+=1
con = controversies[i]
gen_graph_output(con)
The gradual drop in the days after this controversy could be due to it, although it's not so obvious as some other minima in this analysis. Still, somewhat promising. I looked into the two obvious drops apparent in the months after the controversy: the one in early February appears to be related to iPhone X sales numbers (https://www.zacks.com/stock/news/291358/heres-why-apple-aapl-stock-is-recovering-today), whereas the one in late April appears to be related to Morgan Stanley saying the company's iPhone sales for the June quarter will disappoint Wall Street (https://www.cnbc.com/2018/04/20/us-stock-futures-dow-data-earnings-tech-and-politics-on-the-agenda.html).
con.categories = ["impunity"]
con.is_promising = True
i+=1
con = controversies[i]
gen_graph_output(con)
There's no noticeable direct market response to this controversy.
con.categories = ["sex", "offensive"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
Ignore.
con.categories = ["ignore"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
There's a strong, immediate market response to this controversy.
con.categories = ["sjw", "offensive"]
con.is_promising = True
i+=1
con = controversies[i]
gen_graph_output(con)
There's no noticeable direct market response to this controversy. There's something of a dip, but it would be difficult to argue that it's distinguishable from normal turbulence.
con.categories = ["politics", "congress"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
This happened the same day as Sandberg's testimony. In Twitter's case there appears to be a drop more directly related to the controversy; however, there is no bounce back, so it wouldn't have been a profitable buy. Both companies have dramatic cliffs at the end of July, which I thought could have been related to an announcement of the testimony date or something of that sort. Upon further investigation it looks like they're instead related to (separate) metric reports by both companies (https://www.fool.com/investing/2018/07/28/tech-stocks-this-week-facebook-and-twitter-plummet.aspx).
con.categories = ["politics", "congress"]
con.is_promising = False
i+=1
con = controversies[i]
gen_graph_output(con)
This again was the same day as Sandberg and Dorsey's testimonies, with the same result.
con.categories = ["politics", "congress"]
con.is_promising = False
One of the most difficult aspects to this project was determining which category to classify each controversy under. There's no perfectly objective way to choose categories, so I had to make vaguer judgement calls to create categories that I thought would be relevant for a meta analysis. The categories are of course debatable and I'm open to any feedback for how my categorization could be improved. There are also other ways that this could be categorized such as by industry or company-size; the domain of interest here is controversies, however, so I chose to categorize based on controversy traits.
We can visualize the categories below.
import importlib
#importlib.reload(model.Category)
from model.Category import Category
# dict for holding categories with category_name key
# dict makes it easier to build the list of Category objects
cats = {}
# Create cats dictionary
for con in controversies:
    for cat in con.categories:
        category = cats.get(cat)
        if category is None:
            # first time this category name is seen: create and register it
            category = Category(cat)
            cats[cat] = category
        category.add(con)
# flatten cats dictionary into list of Category
cats = [cats[k] for k in cats]
for cat in cats:
    print(cat.category)
In my understanding of the meaning of each category, each of these is distinct and relevant. I figure that most of their definitions are self evident so I haven't gone to great lengths to define them explicitly. The pair I predict might cause confusion is "offensive" (which is particularly relevant to my original hypothesis about faux outrages) and "sjw". To quote my explanation from the controversy where I first used it, "offensive" is a controversy where
no real crime was committed and no real business operation was demonstrated, people were just offended by ideas.
The "sjw" category pertains to which region of the political spectrum would likely find the controversy more relevant, if I deemed the controversy to be politically relevant at all.
The other way I categorized controversies was by whether or not I determined them to be promising based on the stock response to the controversy.
Whether or not to consider a controversy "promising" also presented a massive difficulty. Given that the idea will only be profitable if I can predict from the controversy itself that the stock will fall, I was looking for any data where the stock price appeared to fall in direct response to the controversy. This still raises the question of how to tell whether it's the controversy causing the fall or some other thing, and I don't have a great answer to how to determine that. I simply made a judgement call based on the visual aid of the graph: local maximum on the day of the controversy = promise, sharp decline following the controversy = even more promise (since it will be easiest to model dramatic responses), uptick afterward = yet more promise (since that's where the money is). Any of my classifications are open to debate and revision.
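The judgement call above was made by eye, but the same three visual cues could be turned into numbers. The sketch below is one possible way to do that, not what I actually used to label the data: it measures the largest drop from the event-day close and how much of it is recovered by the end of a window. The function name `dip_and_rebound` and the 30 day horizon are assumptions, and it expects the event date to fall inside the price history.

```python
import pandas as pd


def dip_and_rebound(prices, event_date, horizon_days=30):
    """Return (max_drop, rebound) as fractions of the event-day price.

    max_drop: fall from the last close on or before the event to the lowest
              close within horizon_days calendar days after it.
    rebound:  rise from that low to the last close in the window.
    Assumes `prices` is a Series on a sorted DatetimeIndex.
    """
    event = pd.Timestamp(event_date)
    base = prices.loc[:event].iloc[-1]  # last close on or before the event
    after = prices.loc[event : event + pd.Timedelta(days=horizon_days)]
    low = after.min()
    max_drop = (base - low) / base
    rebound = (after.iloc[-1] - low) / base
    return max_drop, rebound
```

A large `max_drop` with a comparable `rebound` would correspond to the "buy low, sell high" shape; a large drop with no rebound would not.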
The obvious first step for determining whether my categorizations are relevant to promising market opportunities is to examine whether any individual categories show significantly more promise than any of the others. The code below puts the categories into a table and calculates the percentage of controversies in each category that looks promising. I drop any categories with < 2 entries and the "ignore" category.
# set up lists that will become columns in our new DataFrame
category = []
total = []
num_promising = []
num_unpromising = []
promising = []
unpromising = []
perc_promising = []
# build lists
for cat in cats:
    category.append(cat.category)
    total.append(len(cat.promisings)+len(cat.unpromisings))
    num_promising.append(len(cat.promisings))
    num_unpromising.append(len(cat.unpromisings))
    promising.append(cat.promisings)
    unpromising.append(cat.unpromisings)
    perc_promising.append(len(cat.promisings)/(len(cat.promisings)+len(cat.unpromisings))*100)
# create dataframe
df = pd.DataFrame({
    "category" : category, # name of category
    "total" : total, # total controversies in the category
    "num_promising" : num_promising, # number of promising controversies in the category
    "num_unpromising" : num_unpromising, # number of unpromising controversies in the category
    "promising" : promising, # list of promising Controversy
    "unpromising" : unpromising, # list of unpromising Controversy
    "perc_promising" : perc_promising # percent promising
})
# drop categories with < 2 controversies, and ignore
df = df[df.total > 1]
df = df[df.category != "ignore"]
# display
df.sort_values("perc_promising")[["category", "total", "num_promising", "num_unpromising", "perc_promising"]]
def diplay_cons(list_of_cons):
    for con in list_of_cons:
        display(Markdown("##### " + con.company))
        print("Relevant Stock(s): {}".format(con.stocks))
        print("Date: {}".format(con.date.strftime("%Y-%m-%d")))
        print("Summary: {}".format(con.summary))
        if con.notes == con.notes:  # NaN != NaN, so missing notes are skipped
            print("Notes: {}".format(con.notes))
        print("Source: {}".format(con.source))
        con.get_N_day_plot(90)
        plt.show()
diplay_cons(df[df["category"] == "sjw"].promising.values[0])
diplay_cons(df[df["category"] == "sjw"].unpromising.values[0])
The results aren't much better for the other category relevant to that hypothesis, "offensive", where only 3 of the 12 controversies showed any promise. (Note: these categories are overlapping; the results aren't promising so I'm not going into any detail as to how):
diplay_cons(df[df["category"] == "offensive"].promising.values[0])
It may be of interest that 2 of the 3 promising controversies - Papa John's and Comcast - were a result of offensive language (language could've been its own category). This is obviously only a tiny amount of data, but it could warrant further investigation to see if more offensive language based controversies have a predictable market reaction.
On the other hand, the PewDiePie controversy was about offensive language, but doesn't appear to have affected Google's stock (he's a YouTube star). Google being as massive as it is, and PewDiePie not even being a real employee could mean that we simply need to be more specific. Both the promising language controversies were regarding language directly from an employee.
diplay_cons(df[df["category"] == "offensive"].unpromising.values[0])
df.sort_values("perc_promising")[["category", "total", "num_promising", "num_unpromising", "perc_promising"]]
While the sjw hypothesis is clearly false, my instincts regarding corporate impunity appear to have more potential. 62% of "impunity" controversies and 75% of "hack" controversies appear promising. It should be noted that all 4 "hack" controversies overlap with impunity, so I will analyze them as separate groups of "non-hack" and "hack".
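def remove_hacks(nproms):
    """Remove, in place, the controversies labeled "hack" from a list of controversies."""
    # Slice assignment rebuilds the same list object; calling remove() while
    # iterating over the list would skip the element after each removal.
    nproms[:] = [n for n in nproms if "hack" not in n.categories]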
proms = df[df["category"] == "impunity"].promising.values[0]
unproms = df[df["category"] == "impunity"].unpromising.values[0]
# remove the hacks
remove_hacks(proms)
remove_hacks(unproms)
df.loc[df["category"] == "impunity", "total"] = len(unproms) + len(proms)
df.loc[df["category"] == "impunity", "num_promising"] = len(proms)
df.loc[df["category"] == "impunity", "num_unpromising"] = len(unproms)
df.loc[df["category"] == "impunity", "perc_promising"] = (len(proms)/(len(unproms) + len(proms))) * 100
df.sort_values("perc_promising")[["category", "total", "num_promising", "num_unpromising", "perc_promising"]]
diplay_cons(df[df["category"] == "impunity"].promising.values[0])
diplay_cons(df[df["category"] == "impunity"].unpromising.values[0])
There looks to be some predictability to non-hack-related, corporate impunity based controversies. It's important to point out that the one stock that isn't so promising is actually just an ETF proxy for Samsung's stock (see "Notes" above). It also could be argued that the Samsung controversy is promising; I marked it unpromising because ETFs would be very difficult to predict given that they're affected by a wide variety of stocks.
Two of the promising controversies could be considered "fraud", which could have been its own category.
Also of note is that the market responses don't follow an obvious consistent shape. Cigna and Apple have a triangle wave, while Wells Fargo is closer to a square wave. All of them have gradual drops, in comparison to the data below.
diplay_cons(df[df["category"] == "hack"].promising.values[0])
diplay_cons(df[df["category"] == "hack"].unpromising.values[0])
Here are the results that are the most interesting to me. Three of the four hack controversies appear to have market responses. Additionally, the responses appear immediate, in a way that appears to me to be more predictable than the variety of response slopes I think I observe in the "impunity" hypothesis. Of the three promising controversies, the two controversies with the most obvious potential to be profitable, Equifax and Yahoo, are direct responses to data breaches, whereas the Apple story (which looks less profitable) was just a response to the potential of a hack.
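With only four hack controversies, it's worth putting a number on how little "three of four" pins down. The sketch below computes a Wilson score interval for a proportion; this is a standard formula rather than anything used elsewhere in this analysis, and it assumes the controversies can be treated as independent trials, which is itself debatable (two of them involve the same company).

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(3, 4)
print("3 of 4 promising: {:.0%} to {:.0%}".format(low, high))
```

For 3 of 4 the interval runs from roughly 30% to 95%, so the 75% figure is compatible with hacks being promising anywhere from occasionally to almost always.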
Also of note is that the unpromising controversy in this category isn't direct news of a data breach, but rather an update on a 2013 data breach 4 years later. I didn't include the 2013 breach in my original data; however, I can manually construct it here in order to take a look at it. I also manually constructed a couple of other Yahoo hack stories that I found during this research:
from generate_and_save_Controversy_list import date_to_datetime
yahoo2013 = Controversy("Yahoo", ["AABA"], date_to_datetime("1/7/13"), "Yahoo 2013 Mail Hack", "https://thenextweb.com/insider/2013/01/07/yahoo-mail-users-hit-by-widespread-hacking-xss-exploit-seemingly-to-blame/", None)
yahoo2014 = Controversy("Yahoo", ["AABA"], date_to_datetime("1/30/14"), "Yahoo 2014 Mail Hack", "https://www.yahoo.com/news/yahoo-email-account-passwords-stolen-002044026--finance.html", None)
yahoo2016 = Controversy("Yahoo", ["AABA"], date_to_datetime("9/22/16"), "Yahoo reports another 2014 Mail Hack", "https://en.wikipedia.org/wiki/Yahoo!_data_breaches#Late_2014_breach", None)
diplay_cons([yahoo2013, yahoo2014, yahoo2016])
These results are less promising than I had hoped for. The 2013 breach appears to have had no impact on the company's stock whatsoever. The 2014 hack, which was reported at that time, happened when the company's stock was already dropping. Had it happened maybe a day or so before, where that sharp drop off occurred, I would consider it more promising. I checked my date quite thoroughly and think that data is accurate. This could be evidence of some sort of insider trading going on, but that's extremely speculative. The 2016 story is actually news of a 2014 hack; however, it's the first time that hack was reported. There is some promise in that controversy, given how immediate the drop off is afterwards.
To summarize, my sjw hypothesis was obviously incorrect; however, my corporate impunity hypothesis appears to have more truth to it. One way my sjw hypothesis may turn out to be interesting is if I specifically study "language" based controversies, although I worry that these are too few and far between to find a good dataset.
Of the corporate impunity controversies, it appears corporate hacks have particular promise, although the data from the 2013 and 2014 Yahoo hacks have somewhat mitigated my enthusiasm. Something that could be true is that data privacy issues have become more influential in the market only recently, as the public has become more aware of the problem.
The main risks associated with this entire analysis are:
There are really no solutions I can imagine that would solve these issues; they appear inherent to this type of market analysis. It's clear to me now why so many hedge funds are bullshit machines: markets and the events that shape them are very, very difficult to categorize. Political and social climates are constantly in flux, and so even if you can predict the market in one time and place, it's nearly impossible to determine whether it will behave the same way in the future.
It would be great if this idea were to make money, and it's apparent in my analysis that I'm hoping that's the case. There are plenty of ways to further massage categories and speculate on responses to make you think you have a lead, and there's no doubt I have done that to some extent in this analysis. Irrespective of potential future findings, we should consider that this is an intrinsically highly risky venture when deciding on whether to pursue it.
All that being said, there may be enough here to warrant future investigations into data hacks in particular. This is the most promising avenue to me because of the data hacks that do look profitable,
A cursory Google search led me to this page, which contains data on what must be nearly all of the major hack stories going back to 2004. This is a dataset where we could run a more rigorous study, with more sophisticated statistical analysis, to help determine if data hacks are worth investing in. For example, it could help clarify whether or not there's indeed a temporal element to the market response (i.e. is this only a recent phenomenon post 2015 or 2016?). I could classify the stories by whether or not they are new stories about a hack vs updates to an old story about a hack, the time between the story and the hack itself, whether data was actually stolen or just suspected to be vulnerable, etc. A lot more certainty could be gleaned from such a study, however I'm frankly unsure if the data presented here justifies a further investment.
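To make the proposed classification concrete, here is a minimal sketch of how those story-level features might be derived. The column names (`story_date`, `hack_date`, `first_report`), the function name and the 2016 cutoff are all assumptions for illustration; the real dataset's layout would dictate the actual names.

```python
import pandas as pd


def add_story_features(stories, cutoff="2016-01-01"):
    """Add disclosure lag and simple flags to a table of hack stories.

    Expects (hypothetical) columns:
      story_date   - datetime the story was published
      hack_date    - datetime the breach itself occurred
      first_report - True if this is the first news of the breach
    """
    out = stories.copy()
    # days between the breach and the story about it
    out["lag_days"] = (out["story_date"] - out["hack_date"]).dt.days
    # an update to an already-reported breach, rather than fresh news
    out["is_update"] = ~out["first_report"].astype(bool)
    # whether the story falls in the more recent era
    out["recent"] = out["story_date"] >= pd.Timestamp(cutoff)
    return out
```

Grouping market responses by `is_update` and `recent` would then test the two patterns hinted at by the Yahoo stories.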